AITopics | detect outlier

Joints in Random Forests

Neural Information Processing Systems, Dec-24-2025, 06:07:01 GMT

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extends to any pattern of missing input features, provided the features are missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features.
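To make "handle missing features by marginalisation" and "detect outliers by monitoring the marginal probability" concrete, here is a minimal sketch on a hand-specified, hypothetical two-feature tree. Each leaf is treated as a mixture component whose weight is its share of training samples, with a uniform density over the leaf's cell and the leaf's class frequencies as p(y). The uniform leaf densities, the tree, and all names are assumptions for illustration; this is not the authors' GeDT/GeF code, and their leaf models may differ. A missing feature simply drops out of the product, because a density integrated over its own interval equals 1.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """One leaf of a hypothetical generative decision tree."""
    bounds: list        # bounds[j] = (lo, hi): the leaf's cell along feature j
    n_samples: int      # training points that reached this leaf
    class_counts: dict  # label -> count among those points


def leaf_density(leaf, x):
    """Uniform density of x over the leaf cell; None entries are missing and marginalised out."""
    density = 1.0
    for xj, (lo, hi) in zip(x, leaf.bounds):
        if xj is None:
            continue  # integrating a uniform density over its own interval gives 1
        if xj < lo or xj > hi:
            return 0.0  # observed value lies outside this leaf's cell
        density /= hi - lo
    return density


def marginal(leaves, x):
    """p(x_obs) = sum_l w_l * p_l(x_obs), where w_l is the leaf's share of training data."""
    total = sum(leaf.n_samples for leaf in leaves)
    return sum(leaf.n_samples / total * leaf_density(leaf, x) for leaf in leaves)


def class_posterior(leaves, x):
    """p(y | x_obs), proportional to sum_l w_l * p_l(x_obs) * p_l(y)."""
    total = sum(leaf.n_samples for leaf in leaves)
    scores = {}
    for leaf in leaves:
        weight = leaf.n_samples / total * leaf_density(leaf, x)
        for label, count in leaf.class_counts.items():
            scores[label] = scores.get(label, 0.0) + weight * count / leaf.n_samples
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError("x lies outside the support of every leaf")
    return {label: s / z for label, s in scores.items()}


# Hypothetical tree on [0, 1]^2: split x0 at 0.5, then split the left side on x1 at 0.5.
TREE = [
    Leaf([(0.0, 0.5), (0.0, 0.5)], 40, {"a": 36, "b": 4}),
    Leaf([(0.0, 0.5), (0.5, 1.0)], 20, {"a": 2, "b": 18}),
    Leaf([(0.5, 1.0), (0.0, 1.0)], 40, {"a": 10, "b": 30}),
]

full = class_posterior(TREE, (0.2, 0.3))      # only the first leaf contributes
partial = class_posterior(TREE, (0.2, None))  # x1 missing: the first two leaves contribute
outlier_score = marginal(TREE, (1.7, 0.3))    # zero density: far outside the training cells
```

With x1 missing, the prediction blends the two left-hand leaves instead of requiring an imputed value, and a point outside every cell gets zero marginal density, which is the kind of signal an outlier detector can threshold. One natural forest-level version, assumed here, averages these densities across the trees of a forest.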

joint, name change, random forest

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

Neural Information Processing Systems, Aug-17-2025, 19:38:27 GMT

The OVA (one-vs-all) classifier outputs a confidence score that a sample is an inlier, providing a threshold for detecting outliers.
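A minimal sketch of how one-vs-all inlier scores can be thresholded: take the class predicted by the closed-set classifier, read that class's OVA head, and flag the sample when its inlier probability falls below a cutoff. Modelling each OVA head as a single logit passed through a sigmoid, the 0.5 cutoff, the function name `ova_detect`, and the example logits are all assumptions for illustration; how OpenMatch trains these heads is not shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ova_detect(closed_logits, inlier_logits, threshold=0.5):
    """Flag a sample as an outlier using the one-vs-all head of its predicted class.

    closed_logits: K logits from the closed-set classifier.
    inlier_logits: K logits, one per OVA head; sigmoid gives p(inlier | class k).
    Returns (predicted class, p_inlier, is_outlier).
    """
    k = max(range(len(closed_logits)), key=lambda i: closed_logits[i])
    p_inlier = sigmoid(inlier_logits[k])
    return k, p_inlier, p_inlier < threshold


print(ova_detect([2.0, 0.5, -1.0], [1.5, -2.0, 0.0]))  # class 0, p_inlier ≈ 0.82, kept as inlier
print(ova_detect([2.0, 0.5, -1.0], [-1.0, 3.0, 0.0]))  # class 0, p_inlier ≈ 0.27, flagged
```

Only the head of the predicted class is consulted, so a confident class prediction can still be rejected when that class's detector does not recognise the sample.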

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Joints in Random Forests

Neural Information Processing Systems, May-27-2025, 04:41:59 GMT

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation.

artificial intelligence, machine learning, random forest

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Detecting outliers by clustering algorithms

Li, Qi, Wang, Shuliang

arXiv.org Machine Learning, Dec-7-2024

Clustering and outlier detection are two important tasks in data mining. Outliers frequently interfere with how clustering algorithms measure the similarity between objects, producing unreliable clustering results. Currently, only a few clustering algorithms (e.g., DBSCAN) can detect outliers and thereby eliminate their interference. For other clustering algorithms, adding a separate outlier detection step before every clustering run is tedious. Equipping more clustering algorithms with the ability to detect outliers is therefore worthwhile. A common strategy lets clustering algorithms detect outliers based on the distance between objects and clusters, but this conflicts with improving clustering performance on datasets that contain outliers. In this paper, we propose a novel outlier detection approach for clustering, called ODAR. ODAR uses a feature transformation to map outliers and normal objects into two separate clusters, so that any clustering algorithm can detect outliers by identifying clusters. Experiments show that ODAR is robust across diverse datasets. Compared with baseline methods, clustering algorithms aided by ODAR achieve the best results on 7 of 10 datasets, with at least a 5% improvement in accuracy.
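The abstract contrasts ODAR with the common strategy of flagging outliers by their distance to clusters. The sketch below illustrates only that baseline strategy, not ODAR's feature transformation, which the excerpt does not specify. Plain k-means with fixed initial centroids, the 3 × median-distance cutoff, and the toy points are assumptions for illustration.

```python
import math
import statistics


def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd iterations starting from the given initial centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def distance_outliers(points, centroids, factor=3.0):
    """Flag points whose distance to the nearest centroid exceeds factor * median distance."""
    dists = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = factor * statistics.median(dists)
    return [p for p, d in zip(points, dists) if d > cutoff]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]  # the last point is a hypothetical outlier
centroids = kmeans(points, [(0, 0), (10, 10)])
print(distance_outliers(points, centroids))  # → [(5, 20)]
```

The outlier is caught here, but it also drags the second centroid from (10.5, 10.5) to (9.4, 12.4), which is the kind of interference with clustering quality the abstract describes.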

algorithm, odar, outlier

arXiv.org Machine Learning

2412.05669

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Joints in Random Forests

Neural Information Processing Systems, Oct-10-2024, 16:15:17 GMT

Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation.

detect outlier, generative model, random forest

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

ODIM: an efficient method to detect outliers via inlier-memorization effect of deep generative models

Kim, Dongha, Hwang, Jaesung, Lee, Jongjin, Kim, Kunwoong, Kim, Yongdai

arXiv.org Artificial Intelligence, Jan-10-2023

Identifying whether a given sample is an outlier is important in many real-world domains. This study addresses the unsupervised outlier detection problem, in which the training data contain outliers but no labels distinguishing inliers from outliers are available. We propose a powerful and efficient learning framework that uses deep neural networks to identify outliers in a training data set. We start from a new observation called the inlier-memorization (IM) effect: when a deep generative model is trained on data contaminated with outliers, it memorizes inliers before outliers. Exploiting this finding, we develop a new method called outlier detection via the IM effect (ODIM). ODIM requires only a few updates, so it is computationally efficient: tens of times faster than other deep-learning-based algorithms. ODIM also filters out outliers successfully regardless of data type, whether tabular, image, or sequential. We empirically demonstrate the superiority and efficiency of ODIM on 20 data sets.

artificial intelligence, machine learning, outlier

arXiv.org Artificial Intelligence

2301.04257

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.63)

Outlier Detection Techniques in Python

#artificialintelligence, Aug-26-2022, 16:14:08 GMT

Outlier detection, which is the process of identifying extreme values in data, has many applications across a wide variety of industries including finance, insurance, cybersecurity and healthcare. In finance, for example, it can detect malicious events like credit card fraud. In insurance, it can identify forged or fabricated documents. In cybersecurity, it is used for identifying malicious behaviors like password theft and phishing. Finally, outlier detection has been used for rare disease detection in a healthcare context.
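One widely used rule for flagging extreme values in a single numeric column, and plausibly the IQR method this article's tags refer to, is Tukey's fences: anything below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is treated as an outlier. A minimal standard-library sketch; the readings and the default multiplier `k=1.5` are illustrative assumptions, and the article's own implementation may differ.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(readings))  # → [102]
```

Because quartiles barely move when a single extreme value is added, this rule is less sensitive to the outliers it is looking for than a mean ± standard-deviation cutoff.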

iqr, outlier, outlier detection

#artificialintelligence

Country: North America > United States (0.05)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

What is an Isolation Forest? And How Does it Detect Outliers?

#artificialintelligence, Mar-24-2022, 07:15:56 GMT

Isolation Forest is a simple yet incredible unsupervised algorithm that can spot outliers, or anomalies, in a data set very quickly. I would say that understanding this tool is a must for any aspiring data scientist. In this article, I will briefly go through the theory behind the algorithm and its implementation. Its Python implementation in Scikit-learn has been gaining tons of popularity due to its capabilities and ease of use. But before we jump into the implementation, it is always best practice to study its use cases and the theory behind it.
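The idea behind the speed is that anomalies are few and different, so random axis-aligned splits isolate them in fewer steps than normal points. Below is a from-scratch sketch of that recipe: random splits on small subsamples, average path length per query, normalised into a score where values near 1 suggest outliers. It is not scikit-learn's `IsolationForest`; the function names, the 100 trees, the 64-point subsamples, and the synthetic data are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """c(n): average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}  # identical values cannot be split further
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for the points left unisolated there."""
    if "size" in node:
        return depth + avg_path_length(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(data, queries, n_trees=100, sample_size=64, seed=0):
    """Anomaly score 2^(-E[h(x)] / c(psi)); assumes data has at least 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [
        2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
        for x in queries
    ]


gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
center_score, far_score = isolation_scores(data, [(0.0, 0.0), (8.0, 8.0)])
```

The far point ends up in a small right-hand partition after only a few random splits, so its average path is short and its score is higher than that of the point at the centre of the cloud. Capping tree depth at log2 of the subsample size is what keeps the method fast: only short paths need to be resolved exactly.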

anomaly, isolation forest, outlier

#artificialintelligence

Country: Oceania > Australia > Victoria (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.49)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.48)

4 Machine learning techniques for outlier detection in Python

#artificialintelligence, Mar-28-2021, 07:35:05 GMT

Based on reader feedback on my earlier post, "Two outlier detection techniques you should know in 2021", I have decided to write this post covering four different machine learning techniques (algorithms) for outlier detection in Python. For each technique, I will use the I-I (Intuition-Implementation) approach. This will help you understand how each algorithm works behind the scenes without going deep into its mathematics (the Intuition part), and implement each algorithm with the Scikit-learn machine learning library (the Implementation part). I will also use some graphical techniques to describe each algorithm and its output. At the end of this article, a "Key Takeaways" section will present some special strategies for using and combining the four techniques.

detection, novelty detection, outlier detection

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Detecting abnormalities in resting-state dynamics: An unsupervised learning approach

Khosla, Meenakshi, Jamison, Keith, Kuceyeski, Amy, Sabuncu, Mert R.

arXiv.org Machine Learning, Aug-16-2019

Much of the research in this direction has aimed at identifying connectivity-based biomarkers, restricting the analysis to so-called "static" functional connectivity measures that quantify the average degree of synchrony between brain regions. For example, machine-learning-based strategies have been used with static connectivity measures to parcellate the brain into functional networks and to extract individual-level predictions about cognitive state or clinical condition [2]. In recent years, there has been a surge in the study of the temporal dynamics of rsfMRI data, offering a complementary perspective on the functional connectome and on how it is altered in disease, development, and aging [14]. However, to our knowledge, there has been a dearth of machine learning applications to dynamic rsfMRI analysis. Thanks to large-scale datasets, modern machine learning methods have fueled significant progress in computer vision. Compared with natural vision applications, however, medical imaging poses a unique set of challenges. Data, particularly labeled data, are often scarce in medical imaging applications, which makes data-hungry methods such as supervised CNNs potentially less useful. One potential approach to tackling the limited sample size is to exploit unsupervised …
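The excerpt contrasts "static" functional connectivity (average synchrony between brain regions) with the temporal dynamics of rsfMRI. One common way to make that distinction concrete, assumed here rather than taken from the paper's unsupervised method, is full-scan Pearson correlation versus sliding-window correlation. The two regional time series, the window length, and the step size below are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length time series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def static_fc(series):
    """Region-by-region correlation matrix over the whole scan ("static" connectivity)."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


def sliding_window_fc(series, window, step):
    """One correlation matrix per window: a simple view of how connectivity changes over time."""
    length = len(series[0])
    return [
        static_fc([s[start:start + window] for s in series])
        for start in range(0, length - window + 1, step)
    ]


# Two hypothetical regional time series that synchronise, then anti-synchronise.
region_a = [1, 2, 3, 4, 5, 6, 7, 8]
region_b = [1, 2, 3, 4, 4, 3, 2, 1]
static = static_fc([region_a, region_b])[0][1]
dynamic = [m[0][1] for m in sliding_window_fc([region_a, region_b], window=4, step=4)]
print(static)   # → 0.0
print(dynamic)  # → [1.0, -1.0]
```

The static measure averages the two regimes away and reports no coupling, while the windowed view reveals strong coupling that flips sign, which is the kind of information the dynamic perspective adds.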

artificial intelligence, machine learning, sequence

arXiv.org Machine Learning

1908.06168

Genre: Research Report > Experimental Study (0.69)

Industry: